Abstract

We prove a tight lower bound for the exponent $\rho$ for data-dependent Locality-Sensitive
Hashing schemes, recently used to design efficient solutions for the c-approximate
nearest neighbor search. In particular, our lower bound matches the bound of
$\rho \le \frac{1}{2c-1} + o(1)$ for the $\ell_1$ space, obtained via the recent algorithm from [Andoni-Razenshteyn,
STOC’15].

In recent years it emerged that data-dependent hashing is strictly superior to the 
classical Locality-Sensitive Hashing, when the hash function is data-independent. In 
the latter setting, the best exponent has been already known: for the $\ell_1$ space, the
tight bound is $\rho = 1/c$, with the upper bound from [Indyk-Motwani, STOC’98] and
the matching lower bound from [O’Donnell-Wu-Zhou, ITCS’11].

We prove that, even if the hashing is data-dependent, it must hold that
$\rho \ge \frac{1}{2c-1} - o(1)$. To prove the result, we need to formalize the exact notion of data-dependent
hashing that also captures the complexity of the hash functions (in addition to their
collision properties). Without restricting such complexity, we would allow for obviously
infeasible solutions such as the Voronoi diagram of a dataset. To preclude such
solutions, we require our hash functions to be succinct. This condition is satisfied by 
all the known algorithmic results. 


*Work done in part while the author was at the Simons Institute for the Theory of Computing, Berkeley 
University. 





1 Introduction 


We study lower bounds for the high-dimensional nearest neighbor search problem, which is a 
problem of major importance in several areas, such as databases, data mining, information 
retrieval, computer vision, computational geometry, signal processing, etc. This problem 
suffers from the “curse of dimensionality” phenomenon: either space or query time are 
exponential in the dimension d. To escape this curse, researchers proposed approximation 
algorithms for the problem. In the (c, r)-approximate near neighbor problem, the data 
structure may return any data point whose distance from the query is at most cr, for an 
approximation factor c > 1 (provided that there exists a data point within distance r from the 
query). Many approximation algorithms are known for this problem: e.g., see surveys
[Sam06, AI08, And09, WSS+14].

An influential algorithmic technique for the approximate near neighbor search (ANN) is 
the Locality Sensitive Hashing (LSH) [IM98, HPIM12]. The main idea is to hash the points
so that the probability of collision is much higher for points that are close to each other
(at distance $\le r$) than for those which are far apart (at distance $> cr$). Given such hash
functions, one can retrieve near neighbors by hashing the query point and retrieving elements
stored in buckets containing that point. If the probability of collision is at least $p_1$ for the
close points and at most $p_2$ for the far points, the algorithm solves the $(c, r)$-ANN using
essentially $O(n^{1+\rho}/p_1)$ extra space and $O(d n^{\rho}/p_1)$ query time, where $\rho = \frac{\log(1/p_1)}{\log(1/p_2)}$ [HPIM12].
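As a small numerical illustration of these formulas, the following Python sketch computes the exponent and the leading-order space and query-time terms from given collision probabilities. The function names `lsh_exponent` and `lsh_costs` are hypothetical helpers introduced here, and all hidden constants in the $O(\cdot)$ bounds are dropped.

```python
import math


def lsh_exponent(p1: float, p2: float) -> float:
    """Return rho = log(1/p1) / log(1/p2) for 0 < p2 < p1 < 1."""
    if not (0.0 < p2 < p1 < 1.0):
        raise ValueError("expected 0 < p2 < p1 < 1")
    return math.log(1.0 / p1) / math.log(1.0 / p2)


def lsh_costs(n: int, d: int, p1: float, p2: float) -> tuple[float, float]:
    """Leading terms n^(1+rho)/p1 (extra space) and d*n^rho/p1 (query time).

    Constants hidden by the O-notation are ignored (an assumption of this sketch).
    """
    rho = lsh_exponent(p1, p2)
    return n ** (1.0 + rho) / p1, d * n ** rho / p1
```

For example, $p_1 = 1/2$ and $p_2 = 1/4$ give $\rho = 1/2$, so for $n = 1024$ points the query-time term scales like $d \cdot \sqrt{n} / p_1$.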

The value of the exponent $\rho$ thus determines the “quality” of the LSH families used.

Consequently, a lot of work focused on understanding the best possible value $\rho$ for LSH,
including the sequence of upper bounds [IM98, DIIM04, AI06] and lower bounds [MNP07,
OWZ11]. Overall, they established the precise bounds for the best value of $\rho$: for $\ell_1$ the tight
bound is $\rho = \frac{1}{c} \pm o(1)$. In general, for $\ell_p$ where $1 \le p \le 2$, the tight bound is $\rho = \frac{1}{c^p} \pm o(1)$.

Surprisingly, it turns out there exist more efficient ANN data structures, which step outside
the LSH framework. Specifically, [AINR14, AR15] design algorithms using the concept
of data-dependent hashing, which is a randomized hash family that itself adapts to the actual
given dataset. In particular, the result of [AR15] obtains an exponent $\rho \le \frac{1}{2c^p-1} + o(1)$ for
the $\ell_p$ space, thus improving upon the best possible LSH exponent essentially by a factor of
2 for both $\ell_1$ and $\ell_2$ spaces.
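To make the "factor of 2" concrete, the sketch below compares the two exponents with the $o(1)$ terms dropped (an assumption of this illustration). The function names are hypothetical and not taken from any of the cited papers.

```python
def data_independent_exponent(c: float, p: float = 1.0) -> float:
    """Tight LSH exponent 1/c^p for l_p, 1 <= p <= 2 (o(1) terms dropped)."""
    return 1.0 / c ** p


def data_dependent_exponent(c: float, p: float = 1.0) -> float:
    """Exponent 1/(2c^p - 1) of [AR15] for l_p (o(1) terms dropped)."""
    return 1.0 / (2.0 * c ** p - 1.0)


def improvement_factor(c: float, p: float = 1.0) -> float:
    """Ratio of the two exponents; equals 2 - 1/c^p, which tends to 2 as c grows."""
    return data_independent_exponent(c, p) / data_dependent_exponent(c, p)
```

For $c = 2$ in $\ell_1$ the exponents are $1/2$ and $1/3$ (ratio $1.5$); for $c = 10$ the ratio is already $1.9$.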

Our result. Here we prove that the exponent $\rho = \frac{1}{2c^p-1}$ from [AR15] is essentially optimal
even for data-dependent hashing, and cannot be improved upon. Stating the precise theorem 
requires introducing the precise model for the lower bound, which we accomplish below. For 
now, we state our main theorem informally:

Theorem 1.1 (Main, informal). Any data-dependent hashing scheme for $\ell_1$ that achieves
probabilities $p_1$ and $p_2$ must satisfy

\[ \rho = \frac{\log 1/p_1}{\log 1/p_2} \ge \frac{1}{2c-1} - o(1), \]

as long as the description complexity of the hash functions is sufficiently small.

An immediate consequence is that $\rho \ge \frac{1}{2c^p-1} - o(1)$ for all $\ell_p$ with $1 \le p \le 2$, using the
embedding from [LLR95].


1.1 Lower Bound Model 

To state the precise theorem, we need to formally describe what data-dependent hashing is.
First, we state the definition of (data-independent) LSH, as well as define LSH for a fixed
dataset $P$.

Definition 1.2 (Locality-Sensitive Hashing). We say that a hash family $\mathcal{H}$ over $\{0,1\}^d$ is
$(r_1, r_2, p_1, p_2)$-sensitive, if for every $u, v \in \{0,1\}^d$ one has:

• if $\|u - v\|_1 \le r_1$, then $\Pr_{h \sim \mathcal{H}}[h(u) = h(v)] \ge p_1$;

• if $\|u - v\|_1 > r_2$, then $\Pr_{h \sim \mathcal{H}}[h(u) = h(v)] \le p_2$.
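As an illustration of Definition 1.2, the following sketch checks both conditions exhaustively for the classical bit-sampling family on a small hypercube, where a hash function outputs one uniformly random coordinate and hence collides on $u, v$ with probability $1 - \|u - v\|_1 / d$. The helper names are hypothetical, and the exhaustive loop is only feasible for small $d$.

```python
from itertools import product


def bit_sampling_collision(u, v):
    """Pr over a uniformly random coordinate i that u[i] == v[i]."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


def is_sensitive(d, r1, r2, p1, p2):
    """Exhaustively check both conditions of the definition on {0,1}^d."""
    cube = list(product((0, 1), repeat=d))
    for u in cube:
        for v in cube:
            dist = sum(a != b for a, b in zip(u, v))
            pr = bit_sampling_collision(u, v)
            if dist <= r1 and pr < p1:   # close pair collides too rarely
                return False
            if dist > r2 and pr > p2:    # far pair collides too often
                return False
    return True
```

For $d = 6$, the family passes the check with $(r_1, r_2, p_1, p_2) = (1, 3, 5/6, 1/2)$ and fails it once $p_1$ is raised to $0.9$.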

We now refine the notion of data-independent LSH, where we require the distribution to
work only for a particular dataset P. 

Definition 1.3 (LSH for a dataset P). A hash family $\mathcal{H}$ over $\{0,1\}^d$ is said to be
$(r_1, r_2, p_1, p_2)$-sensitive for a dataset $P \subseteq \{0,1\}^d$, if:

• for every $v \in \{0,1\}^d$ and every $u \in P$ with $\|u - v\|_1 \le r_1$ one has
  $\Pr_{h \sim \mathcal{H}}[h(u) = h(v)] \ge p_1$;

• $\Pr_{h \sim \mathcal{H},\ u, v \sim P}[h(u) = h(v) \text{ and } \|u - v\|_1 > r_2] \le p_2$.

Note that the second definition is less stringent than the first one: in fact, if all the
points in a dataset are at distance more than $r_2$ from each other, then an LSH family $\mathcal{H}$ is
also LSH for $P$, but not necessarily vice versa! Furthermore, in the second definition, we
require the second property to hold only on average (in contrast to every point as in the first
definition). This aspect means that, while Definition 1.3 is certainly necessary for an ANN
data structure, it is not obviously sufficient. Indeed, the algorithm from [AR15] requires
proving additional properties of their partitioning scheme, and in particular analyzes triples
of points. Since here we focus on lower bounds, this aspect will not be important.

We are now ready to introduce data-dependent hashing. 

Data-dependent hashing. Intuitively, a data-dependent hashing scheme is one where we
can pick the family $\mathcal{H}$ as a function of $P$, and thus be sensitive for $P$. An obvious solution
would hence be to choose $\mathcal{H}$ to consist of a single hash function $h$ which is just the Voronoi
diagram of the dataset $P$: it will be $(r, cr, 1, 0)$-sensitive for $P$ and hence $\rho = 0$. However,
this does not yield a good data structure for ANN since evaluating such a hash function on a
query point $q$ is as hard as the original problem!
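The following minimal sketch shows what such a "Voronoi diagram" hash function looks like in code and why it is useless as a data structure: evaluating it is a linear scan, i.e., an exact nearest neighbor search. The names `hamming` and `voronoi_hash` are hypothetical, and ties are broken toward the smallest index as an arbitrary choice of this sketch.

```python
def hamming(u, v):
    """Hamming (l_1) distance between two 0/1 tuples of equal length."""
    return sum(a != b for a, b in zip(u, v))


def voronoi_hash(dataset):
    """Return h with h(q) = index of the data point nearest to q.

    Each evaluation scans the whole dataset, so computing h(q) is exactly
    as expensive as answering the nearest neighbor query by brute force.
    """
    def h(q):
        return min(range(len(dataset)), key=lambda i: hamming(dataset[i], q))
    return h
```

Distinct data points always fall into different cells, so no pair of far data points collides, while a query falls into the cell of its nearest data point.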

Hence, ideally, our lower bound model would require that the hash function is computationally
efficient to evaluate. We do not know how to formulate such a condition which
would not make the question as hard as circuit lower bounds or high cell-probe lower bounds, 
which would be well beyond the scope of this paper. 

Instead, we introduce a condition on the hash family that can be roughly summarized 
as “hash functions from the family are succinct”. For a precise definition and discussion see 
below. For now, let us point out that all the known algorithmic results satisfy this condition. 

Finally, we are ready to state the main result formally. 

Theorem 1.4 (Main theorem, full). Fix the approximation $c > 1$ to be a constant. Suppose
the dataset $P$ lives in the Hamming space $\{0,1\}^d$, where the dataset size $n = |P|$ is such that
$d = \omega(\log n)$ as $n$ tends to infinity. There exist distance thresholds $r$ and $(c - o(1))r$ with
the following property.

Suppose there exist $T$ hash functions $\{h_i\}_{1 \le i \le T}$ over $\{0,1\}^d$ such that for every $n$-point
dataset $P$ there exists a distribution $\mathcal{H}_P$ over $h_i$’s such that the corresponding hash family is
$(r, (c - o(1))r, p_1, p_2)$-sensitive for $P$, where $0 < p_1, p_2 < 0.99$. For any such data-dependent
scheme it must either hold that $\rho = \frac{\log 1/p_1}{\log 1/p_2} \ge \frac{1}{2c-1} - o(1)$ or that $\frac{\log T}{p_1} \ge n^{1-o(1)}$.

Interpreting Theorem 1.4. Let us explain the conditions and the conclusions of Theorem 1.4
in more detail.

We start by interpreting the conclusions. As explained above, the bound
$\rho = \frac{\log 1/p_1}{\log 1/p_2} \ge \frac{1}{2c-1} - o(1)$ directly implies the lower bound on the query time for any scheme that
is based on data-dependent hashing.

The second bound $\frac{\log T}{p_1} \ge n^{1-o(1)}$ is a little bit more mysterious. Let us now explain
what it means precisely. The quantity $\log T$ can be interpreted as the description complexity
of a hash function sampled from the family. At the same time, if we use a family with
collision probability $p_1$ for close points, we need at least $1/p_1$ hash tables to achieve constant
probability of success. Since in each hash table we evaluate at least one hash function, the
quantity $\frac{\log T}{p_1}$ can be interpreted as the lower bound for the total space occupied by hash
functions we evaluate during each query. In all known constructions of (data-independent or
data-dependent) LSH families [IM98, DIIM04, AI06, TT07, AINR14, AR15] the evaluation
time of a single hash function is comparable to the space it occupies (for a discussion of
why this is true for [AR15], see Appendix A); thus, under this assumption, we cannot achieve
query time $n^{1-\Omega(1)}$ unless $\rho \ge \frac{1}{2c-1} - o(1)$. On the other hand, we can achieve $\rho = 0$ by
considering a data-dependent hash family that consists only of the Voronoi diagram of a
dataset (trivially, $p_1 = 1$ and $p_2 = 0$ for this case), thus the conclusion $\frac{\log T}{p_1} \ge n^{1-o(1)}$ cannot
be omitted in general^1. Note that the “Voronoi diagram family” is very slow to evaluate:
to locate a point we need to solve an instance of exact Nearest Neighbor Search, which is
unlikely to be possible in strongly sublinear time. Thus, this family satisfies the above
assumption “evaluation time is comparable to the space”.

We now turn to interpreting the conditions. We require that $d = \omega(\log n)$^2. We conjecture
that this requirement is necessary and for $d = O(\log n)$ there is an LSH family that gives
a better value of $\rho$ (the improvement, of course, would depend on the hidden constant in
the expression $d = O(\log n)$). Moreover, if one steps outside the pure data-dependent LSH
framework, in a recent paper [BDGL15] an improved data structure for ANN for the case
$d = O(\log n)$ is presented, which achieves an improvement similar to what we conjectured
above.


1.2 Techniques and Related Work 

There are two components to our lower bound, i.e., Theorem 1.4.

The first component is a lower bound for data-independent LSH for a random dataset.
We show that in this case, we must have $\rho \ge \frac{1}{2c-1} - o(1)$. This is in contrast to the lower
bound of [OWZ11], who achieve a higher lower bound but for the case when the (far) points
are correlated. Our lower bound is closer in spirit to [MNP07], who also consider the case
when the far points are random and uncorrelated. In fact, this component is a strengthening of
the lower bound from [MNP07], and is based crucially on an inequality proved there.

We mention that, in [Dub10], Dubiner has also considered the setting of a random dataset
for a related problem: finding the closest pair in a given dataset $P$. Dubiner sets up a certain
related “bucketing” model, in which he conjectures the lower bound, which would imply a
$\rho \ge \frac{1}{2c-1}$ lower bound for data-independent LSH for a random set. Dubiner verifies the
conjecture computationally and claims it is proved in a different manuscript^3.

^1 For the Voronoi diagram, $\log T \ge n$, since to specify it, one needs at least $n$ bits.

^2 When ANN for the general dimension $d$ is being solved, one usually first performs some form of
dimensionality reduction [JL84, KOR00, DG03]. Since at this stage we do not want distances to be distorted
by a factor more than $1 + o(1)$, the target dimension is precisely $\omega(\log n)$. So, the assumption $d = \omega(\log n)$
in Theorem 1.4 in some sense captures a truly high-dimensional case.

We also point out that, for the $\ell_2$ case, the optimal data-independent lower bound
$\rho \ge \frac{1}{2c^2-1} - o(1)$ follows from a recent work [AIK+15]. In fact, it shows almost exact
trade-off between $p_1$ and $p_2$ (not only the lower bound on $\rho = \frac{\log(1/p_1)}{\log(1/p_2)}$). Unfortunately, the
techniques there are really tailored to the Euclidean case (in particular, a powerful isoperimetric
inequality of Feige and Schechtman is used [FS02]) and it is unclear how to extend it
to $\ell_1$, as well as, more generally, to $\ell_p$ for $1 \le p < 2$.

Our second component focuses on the data-dependent aspect of the lower bound. In
particular, we prove that if there exists a data-dependent hashing scheme for a random
dataset with a better $\rho$, then in fact there is also a data-independent such hashing scheme.
To accomplish this, we consider the “empirical” average $p_1$ and $p_2$ for a random dataset, and
prove it is close to the average $p_1$ and $p_2$, for which we can deduce a lower bound from the
first component.

In terms of related work, we also must mention the papers of [PTW08, PTW10], who
prove cell-probe lower bounds for the same problem of ANN. In particular, their work utilizes
the lower bound of [MNP07] as well. Their results are however incomparable to our results:
while their results are unconditional, our model allows us to prove a much higher lower 
bound, in particular matching the best algorithms. 

2 Data-independent Lower Bound 

In this section we prove the first component of Theorem 1.4. Overall we show a lower bound
of $\rho \ge \frac{1}{2c-1} - o(1)$ for data-independent hash families for random datasets. Our proof is
a strengthening of [MNP07]. The final statement appears as Corollary 2.7. In the second
component, we will use a somewhat stronger statement (Lemma 2.6).

For $u \in \{0,1\}^d$ and non-negative integer $k$ define a random variable $W_k(u)$ distributed
over $\{0,1\}^d$ to be the resulting point of the standard random walk of length $k$ that starts
in $u$ (at each step we flip a random coordinate).

We build on the following inequality from [MNP07] that is proved using Fourier analysis
on $\{0,1\}^d$.

Lemma 2.1 ([MNP07]). For every hash function $h\colon \{0,1\}^d \to \mathbb{Z}$ and every odd positive
integer $k$ one has:

\[ \Pr_{u \sim \{0,1\}^d,\ v \sim W_k(u)} [h(u) = h(v)] \le \Pr_{u, v \sim \{0,1\}^d} [h(u) = h(v)]^{\frac{\exp(2k/d) - 1}{\exp(2k/d) + 1}}. \]

^3 This manuscript does not appear to be available at the moment of writing of the present paper.

The above inequality can be thought of as a lower bound on $\rho$ already. In particular,
the left-hand-side quantity is the probability of collision of a point and a point generated via a
random walk of length $k$ from it. The right-hand side corresponds to collision of random
independent points, which are usually at distance $d/2$. One already obtains a lower bound
on $\rho$ by considering $k = d/2c$.
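The inequality of Lemma 2.1 can be checked by brute force on a tiny hypercube. The sketch below computes both collision probabilities exactly for a given hash function and compares them. All function names are hypothetical, and the enumeration is only practical for very small $d$ and $k$.

```python
import math
from itertools import product


def walk_collision(d, k, h):
    """Exact Pr[h(u) = h(v)] for uniform u and v the endpoint of a k-step
    walk from u that flips one uniformly random coordinate per step."""
    cube = list(product((0, 1), repeat=d))
    total = 0.0
    for u in cube:
        dist = {u: 1.0}                       # distribution of the walk position
        for _ in range(k):
            nxt = {}
            for w, pr in dist.items():
                for i in range(d):
                    x = w[:i] + (1 - w[i],) + w[i + 1:]
                    nxt[x] = nxt.get(x, 0.0) + pr / d
            dist = nxt
        total += sum(pr for w, pr in dist.items() if h(w) == h(u))
    return total / len(cube)


def uniform_collision(d, h):
    """Exact Pr[h(u) = h(v)] for independent uniform u, v in {0,1}^d."""
    counts = {}
    for u in product((0, 1), repeat=d):
        counts[h(u)] = counts.get(h(u), 0) + 1
    return sum(c * c for c in counts.values()) / 4 ** d


def lemma_2_1_holds(d, k, h):
    """Check the inequality with exponent (e^{2k/d} - 1) / (e^{2k/d} + 1)."""
    e = math.exp(2 * k / d)
    exponent = (e - 1) / (e + 1)
    return walk_collision(d, k, h) <= uniform_collision(d, h) ** exponent + 1e-12
```

For $d = 4$ and $k = 3$ the check passes for the first-coordinate hash and for the parity hash. For even $k$ the parity hash collides with probability 1 and the check fails, which is consistent with the lemma being stated for odd $k$ only.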

To strengthen the lower bound of [MNP07], we analyze carefully the distance between the
endpoints of a random walk in the hypercube. In particular, the next (somewhat folklore)
lemmas show that this distance is somewhat smaller than the (trivial) upper bound of $k$.
Denote by $X_k$ the distance $\|W_k(u) - u\|_1$. Note that the distribution of $X_k$ does not depend on
a particular starting point $u$.


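Lemma 2.2. For every $k$ one has

\[ \mathrm{E}[X_k] = \frac{d}{2} \cdot \left(1 - \left(1 - \frac{2}{d}\right)^{k}\right). \]

Proof. We have that

\[ \mathrm{E}[X_k \mid X_{k-1} = t] = \Pr[X_k = t - 1] \cdot (t - 1) + \Pr[X_k = t + 1] \cdot (t + 1)
   = \frac{t}{d} \cdot (t - 1) + \left(1 - \frac{t}{d}\right) \cdot (t + 1) = \left(1 - \frac{2}{d}\right) \cdot t + 1. \]

Thus,

\[ \mathrm{E}[X_k] = \left(1 - \frac{2}{d}\right) \cdot \mathrm{E}[X_{k-1}] + 1. \]

Since $\mathrm{E}[X_0] = 0$, we obtain that

\[ \mathrm{E}[X_k] = \sum_{i=0}^{k-1} \left(1 - \frac{2}{d}\right)^{i} = \frac{d}{2} \cdot \left(1 - \left(1 - \frac{2}{d}\right)^{k}\right). \]

□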
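The closed form of Lemma 2.2 can be confirmed with exact rational arithmetic by evolving the distribution of the distance step by step: from distance $t$, the walk moves to $t - 1$ with probability $t/d$ and to $t + 1$ otherwise. The function names below are hypothetical.

```python
from fractions import Fraction


def distance_distribution(d, k):
    """Exact distribution of X_k = ||W_k(u) - u||_1, indexed by distance."""
    dist = [Fraction(0)] * (d + 1)
    dist[0] = Fraction(1)
    for _ in range(k):
        nxt = [Fraction(0)] * (d + 1)
        for t, pr in enumerate(dist):
            if pr == 0:
                continue
            if t > 0:
                nxt[t - 1] += pr * Fraction(t, d)      # flip a differing coordinate
            if t < d:
                nxt[t + 1] += pr * Fraction(d - t, d)  # flip an agreeing coordinate
        dist = nxt
    return dist


def expected_distance(d, k):
    """E[X_k] computed from the exact distribution."""
    return sum(t * pr for t, pr in enumerate(distance_distribution(d, k)))


def closed_form(d, k):
    """(d/2) * (1 - (1 - 2/d)^k), the value claimed by Lemma 2.2."""
    return Fraction(d, 2) * (1 - (1 - Fraction(2, d)) ** k)
```

Because the computation uses `Fraction`, the two values can be compared for exact equality rather than up to rounding.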


We can now prove that the value of $X_k$ indeed concentrates well around the expectation,
using concentration inequalities.

Lemma 2.3. For every $k$ and every $t > 0$ one has

\[ \Pr\left[X_k \ge \mathrm{E}[X_k] + t \cdot \sqrt{k}\right] \le e^{-t^2/4}. \]

Proof. For $i \ge 1$ define $Y_i$ to be the index of the coordinate that got flipped at time $i$.
Obviously, we have that $X_k$ is a (deterministic) function of $Y_1, \ldots, Y_k$. Furthermore, changing
one $Y_i$ changes $X_k$ by at most 2. Hence we can apply McDiarmid’s inequality to
$X_k = f(Y_1, \ldots, Y_k)$:

Theorem 2.4 (McDiarmid’s inequality). Let $Y_1, \ldots, Y_k$ be independent random variables and
$X = f(Y_1, \ldots, Y_k)$ such that changing variable $Y_i$ only changes the value by at most $c_i$. Then
we have that

\[ \Pr[X \ge \mathrm{E}[X] + \varepsilon] \le \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{k} c_i^2}\right). \]

Hence, with $c_i = 2$ for every $i$, we obtain that

\[ \Pr[X_k \ge \mathrm{E}[X_k] + \varepsilon] \le \exp\left(-\frac{\varepsilon^2}{2k}\right) \le \exp\left(-\frac{\varepsilon^2}{4k}\right). \]

Substituting $\varepsilon = t \cdot \sqrt{k}$, we get the result. □
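The tail bound can be compared with the exact tail of $X_k$ for concrete parameters. The sketch below recomputes the distribution of the distance in floating point and evaluates $\Pr[X_k \ge \mathrm{E}[X_k] + t\sqrt{k}]$. The function names are hypothetical and the parameter values in the usage note are arbitrary.

```python
import math


def distance_distribution(d, k):
    """Distribution of X_k = ||W_k(u) - u||_1 as a list indexed by distance."""
    dist = [0.0] * (d + 1)
    dist[0] = 1.0
    for _ in range(k):
        nxt = [0.0] * (d + 1)
        for t, pr in enumerate(dist):
            if t > 0:
                nxt[t - 1] += pr * t / d        # distance decreases
            if t < d:
                nxt[t + 1] += pr * (d - t) / d  # distance increases
        dist = nxt
    return dist


def tail_probability(d, k, t):
    """Pr[X_k >= E[X_k] + t * sqrt(k)] from the computed distribution."""
    dist = distance_distribution(d, k)
    mean = sum(x * pr for x, pr in enumerate(dist))
    threshold = mean + t * math.sqrt(k)
    return sum(pr for x, pr in enumerate(dist) if x >= threshold)
```

For instance, with $d = 50$ and $k = 30$, the computed tails for $t \in \{0.5, 1, 2\}$ all stay below $e^{-t^2/4}$.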

Note that choosing $k \approx \frac{d}{2} \cdot \ln \frac{c}{c-1}$ will mean that the distance $X_k = \|W_k(u) - u\|_1$ is now
around $d/2c$, i.e., we can actually use random walks longer than the ones considered in
[MNP07]. Indeed, from Lemma 2.2 and Lemma 2.3 one can immediately conclude the
following corollary.
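A quick numerical check of this choice, using the closed form of Lemma 2.2: the walk of length about $\frac{d}{2} \ln \frac{c}{c-1}$ ends at expected distance close to $d/2c$, whereas the shorter walk of length $d/2c$ stops well short of it. The helper names are hypothetical, and rounding $k$ to the nearest integer is a simplification of this sketch (the proof below additionally needs $k$ to be odd).

```python
import math


def walk_length(d, c):
    """k ~ (d/2) * ln(c / (c - 1)), rounded to the nearest integer."""
    return round(d / 2 * math.log(c / (c - 1)))


def expected_distance(d, k):
    """Lemma 2.2: E[X_k] = (d/2) * (1 - (1 - 2/d)^k)."""
    return d / 2 * (1 - (1 - 2 / d) ** k)
```

With $d = 10000$ and $c = 2$, the target distance is $d/2c = 2500$. The longer walk has about $3466$ steps and expected distance within a few units of $2500$, while a walk of $2500$ steps has expected distance below $2000$.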


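Corollary 2.5. Let $c > 1$ be a fixed constant. Suppose that $\gamma = \gamma(d) > 0$ is such that
$\gamma = o(1)$ as $d \to \infty$. Then, there exists $\alpha = \alpha(d)$ such that

\[ \alpha(d) = \frac{1}{2} \cdot \ln \frac{c}{c-1} + o_{c,\gamma}(1) \]

that satisfies the following:

\[ \Pr\left[X_{\alpha \cdot d} > \frac{d}{2c}\right] \le 2^{-\gamma d}. \]
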
We are now ready to prove the main lemma of this section, which will be used in the
later section on data-dependent hashing. We need to introduce two more definitions. We
define “average $p_1$” as:

\[ \zeta(c, d, \alpha, h) := \Pr_{u \sim \{0,1\}^d,\ v \sim W_{\alpha \cdot d}(u)}\left[h(u) = h(v) \,\middle|\, \|u - v\|_1 \le \frac{d}{2c}\right] \]

for $c > 1$, positive integer $d$, $\alpha > 0$ and a hash function $h\colon \{0,1\}^d \to \mathbb{Z}$. Similarly, we define
the “average $p_2$” as

\[ \eta(d, \beta, h) := \Pr_{u, v \sim \{0,1\}^d}\left[h(u) = h(v),\ \|u - v\|_1 > \left(\frac{1}{2} - \beta\right) \cdot d\right] \]

for positive integer $d$, $\beta > 0$ and a hash function $h\colon \{0,1\}^d \to \mathbb{Z}$.

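Lemma 2.6. Let $c > 1$ be a fixed constant. Suppose that $\gamma = \gamma(d) > 0$ is such that $\gamma = o(1)$
as $d \to \infty$. Then, there exist $\alpha = \alpha(d)$, $\beta = \beta(d)$ and $\rho = \rho(d)$ such that:

\[ \alpha = \frac{1}{2} \cdot \ln \frac{c}{c-1} + o_{c,\gamma}(1), \qquad \beta = o_{\gamma}(1), \qquad \rho = \frac{1}{2c-1} - o_{c,\gamma}(1) \]

as $d \to \infty$ such that, for every $d$ and every hash function $h\colon \{0,1\}^d \to \mathbb{Z}$, one has

\[ \zeta(c, d, \alpha, h) \le \eta(d, \beta, h)^{\rho} + 2^{-\gamma d}. \]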


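Proof. We use Corollary 2.5 to choose $\alpha = \frac{1}{2} \cdot \ln \frac{c}{c-1} + o_{c,\gamma}(1)$ such that $\alpha \cdot d$ is an odd integer
and

\[ \Pr\left[X_{\alpha \cdot d} > \frac{d}{2c}\right] \le 2^{-\gamma d}. \tag{1} \]

We can choose $\beta = o_{\gamma}(1)$ so that

\[ \Pr_{u, v \sim \{0,1\}^d}\left[\|u - v\|_1 \le \left(\frac{1}{2} - \beta\right) \cdot d\right] \le 2^{-\gamma d}. \tag{2} \]

(This follows from the standard Chernoff-type bounds.)
Now we apply Lemma 2.1 and get

\[ \Pr_{u \sim \{0,1\}^d,\ v \sim W_{\alpha \cdot d}(u)} [h(u) = h(v)] \le \Pr_{u, v \sim \{0,1\}^d} [h(u) = h(v)]^{\rho}, \tag{3} \]

where

\[ \rho = \frac{\exp(2\alpha) - 1}{\exp(2\alpha) + 1} = \frac{1}{2c-1} - o_{c,\gamma}(1). \]

Finally, we combine (1), (2) and (3), and get the desired inequality. □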
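The limiting value of the exponent can be verified directly. Dropping the $o_{c,\gamma}(1)$ correction (a simplification made only for this worked computation) and taking $\alpha = \frac{1}{2} \ln \frac{c}{c-1}$ gives:

```latex
\[
e^{2\alpha} = \frac{c}{c-1}
\quad\Longrightarrow\quad
\frac{e^{2\alpha} - 1}{e^{2\alpha} + 1}
= \frac{\frac{c}{c-1} - 1}{\frac{c}{c-1} + 1}
= \frac{\frac{1}{c-1}}{\frac{2c-1}{c-1}}
= \frac{1}{2c-1}.
\]
```

For comparison, the shorter walk with $k = d/2c$, i.e., $\alpha = \frac{1}{2c}$, yields the exponent $\frac{e^{1/c} - 1}{e^{1/c} + 1} = \tanh\frac{1}{2c}$, which is strictly smaller than $\frac{1}{2c-1}$ for every $c > 1$.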


The following corollary shows how the above lemma implies a lower bound on
data-independent LSH.


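Corollary 2.7. For every $c > 1$ and $\gamma = \gamma(d) > 0$ such that $\gamma = o(1)$ there exists $\beta = \beta(d) > 0$
with $\beta = o_{\gamma}(1)$ such that if $\mathcal{H}$ is a data-independent $(d/(2c), (1/2 - \beta)d, p_1, p_2)$-sensitive
family, then

\[ p_1 \le p_2^{\frac{1}{2c-1} - o_{c,\gamma}(1)} + 2^{-\gamma d}. \]

Proof. We observe that for every $\alpha, \beta > 0$:

• $p_1 \le \mathrm{E}_{h \sim \mathcal{H}}[\zeta(c, d, \alpha, h)]$;

• $p_2 \ge \mathrm{E}_{h \sim \mathcal{H}}[\eta(d, \beta, h)]$.

Now we apply Lemma 2.6 together with the following application of Jensen’s inequality:

\[ \mathrm{E}_{h \sim \mathcal{H}}\left[\eta(d, \beta, h)^{\rho}\right] \le \mathrm{E}_{h \sim \mathcal{H}}\left[\eta(d, \beta, h)\right]^{\rho}, \]

since $0 < \rho \le 1$. □
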

3 Data-Dependent Hashing 

We now prove the second component of the proof of the main Theorem 1.4. In particular, we show
that a very good data-dependent hashing scheme would refute Lemma 2.6 from the previous
section.


3.1 Empirical Probabilities of Collision 

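For a particular dataset $P$, we will be interested in empirical probabilities $\hat{p}_1, \hat{p}_2$ (i.e., the
equivalents of $\zeta, \eta$ for a given set $P$), defined as follows. Let $0 < \delta(d) < 1/3$ be some
function. Let $P$ be a random set of points from $\{0,1\}^d$ of size $2^{\delta(d) \cdot d}$. The empirical versions
of $\zeta$ and $\eta$ with respect to $P$ are:

\[ \hat{\zeta}(c, d, \alpha, h, P) := \Pr_{u \sim P,\ v \sim W_{\alpha \cdot d}(u)}\left[h(u) = h(v) \,\middle|\, \|u - v\|_1 \le \frac{d}{2c}\right], \]

\[ \hat{\eta}(d, \beta, h, P) := \Pr_{u, v \sim P}\left[h(u) = h(v),\ \|u - v\|_1 > \left(\frac{1}{2} - \beta\right) \cdot d\right]. \]
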
We now want to prove that, for a random dataset $P$, the empirical $\hat{\zeta}, \hat{\eta}$ are close to the
true averages. For this we will need the following auxiliary lemma.

Lemma 3.1. Let $M$ be an $n \times n$ symmetric matrix with entries from $[0; 1]$ and average $\varepsilon$.
Let $M'$ be a principal $n^{\delta} \times n^{\delta}$ submatrix of $M$ sampled uniformly with replacement. Then,
for every $\theta > 0$, the probability that the maximum of the average over $M'$ and $\theta$ does not lie
in $[1/2; 2] \cdot \max\{\varepsilon, \theta\}$ is at most $n^{\delta} \cdot 2^{-\Omega(\theta n^{\delta})}$.

Proof. We need the following version of Bernstein’s inequality.

Lemma 3.2. Suppose that $X_1, \ldots, X_n$ are i.i.d. random variables that are distributed over
$[0; 1]$. Suppose that $\mathrm{E}[X_1] = \varepsilon$. Then, for every $0 < \theta < 1$, one has

\[ \Pr\left[\max\left\{\frac{1}{n} \sum_{i=1}^{n} X_i,\ \theta\right\} \notin [1/2; 2] \cdot \max\{\varepsilon, \theta\}\right] \le 2^{-\Omega(\theta n)}. \]

We just apply Lemma 3.2 and take union bound over the rows of $M'$. □
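The role of the threshold $\theta$ in Lemma 3.2 can be seen in a small simulation with Bernoulli variables (one particular distribution on $[0; 1]$, chosen here only for illustration). The function names and parameter values are hypothetical, and the simulation is an empirical illustration, not a proof.

```python
import random


def within_band(samples, eps, theta):
    """Check max{mean, theta} in [1/2, 2] * max{eps, theta}."""
    lhs = max(sum(samples) / len(samples), theta)
    rhs = max(eps, theta)
    return rhs / 2 <= lhs <= 2 * rhs


def failure_rate(n, eps, theta, trials, seed=0):
    """Fraction of trials in which the band condition fails for
    n i.i.d. Bernoulli(eps) samples."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        samples = [1.0 if rng.random() < eps else 0.0 for _ in range(n)]
        failures += not within_band(samples, eps, theta)
    return failures / trials
```

When $\theta n$ is large (for example $n = 2000$, $\theta = 0.05$), no failures are observed, even if the true mean is far below $\theta$. When $\theta n$ is small (for example $n = 20$, $\theta = 0.01$, $\varepsilon = 0.05$), failures are frequent, in line with the $2^{-\Omega(\theta n)}$ form of the bound.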

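The following two lemmas are immediate corollaries of Lemma 3.2 and Lemma 3.1,
respectively.

Lemma 3.3. For every $c > 1$, $\alpha > 0$, positive integer $d$, $\theta > 0$ and a hash function
$h\colon \{0,1\}^d \to \mathbb{Z}$, one has

\[ \Pr_{P}\left[\max\{\hat{\zeta}(c, d, \alpha, h, P), \theta\} \in [1/2; 2] \cdot \max\{\zeta(c, d, \alpha, h), \theta\}\right] \ge 1 - 2^{-\Omega(\theta \cdot 2^{\delta(d) \cdot d})}. \]

Lemma 3.4. For every $\beta > 0$, positive integer $d$, $\theta > 0$ and a hash function $h\colon \{0,1\}^d \to \mathbb{Z}$,
one has

\[ \Pr_{P}\left[\max\{\hat{\eta}(d, \beta, h, P), \theta\} \in [1/2; 2] \cdot \max\{\eta(d, \beta, h), \theta\}\right] \ge 1 - 2^{\delta(d) \cdot d} \cdot 2^{-\Omega(\theta \cdot 2^{\delta(d) \cdot d})}. \]
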

3.2 Proof of Theorem 1.4

We are finally ready to complete the proof of the main result, Theorem 1.4.

Let us first assume that $p_1 = o(1)$ and then show how to handle the general case. Suppose
that $n = |P| = 2^{\delta d}$. By the assumption of Theorem 1.4, $\delta = o(1)$. Let $\{h_1, h_2, \ldots, h_T\}$ be a
set of hash functions. We can assume that $\frac{\log T}{p_1} \le n^{1-\Omega(1)}$, since otherwise we are done. Let
us fix $\gamma = \gamma(d) > 0$ such that $\gamma = o(1)$ and $2^{-\gamma d} \ll p_1^{\omega(1)}$. We can do this, since if $p_1 = 2^{-\Omega(d)}$,
then $1/p_1 = n^{\omega(1)}$ and the desired statement is true. Then, by Lemma 2.6 there are $\alpha = \alpha(d)$
and $\beta = \beta(d) = o_{\gamma}(1)$ such that for every $1 \le i \le T$

\[ \zeta(c, d, \alpha, h_i) \le \eta(d, \beta, h_i)^{\frac{1}{2c-1} - o_{c,\gamma}(1)} + 2^{-\gamma d}. \tag{4} \]

Let us choose $\theta > 0$ such that $\frac{\log T}{\theta} \le n^{1-\Omega(1)}$ and $\theta \ll p_1$. We can do it since, by the above
assumption, $\frac{\log T}{p_1} \le n^{1-\Omega(1)}$.

Then, from Lemma 3.3 and Lemma 3.4 we get that, with high probability over the choice
of $P$, one has for every $1 \le i \le T$:

• $\max\{\hat{\zeta}(c, d, \alpha, h_i, P), \theta\} \in [1/2; 2] \cdot \max\{\zeta(c, d, \alpha, h_i), \theta\}$;

• $\max\{\hat{\eta}(d, \beta, h_i, P), \theta\} \in [1/2; 2] \cdot \max\{\eta(d, \beta, h_i), \theta\}$.

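Suppose these conditions hold and assume there exists a distribution $\mathcal{D}$ over $[T]$ such that
the corresponding hash family is $(d/2c, (1/2 - \beta(d))d, p_1, p_2)$-sensitive for $P$. Then,

\[ p_1 \le \mathrm{E}_{i \sim \mathcal{D}}[\hat{\zeta}(c, d, \alpha, h_i, P)] \le \mathrm{E}_{i \sim \mathcal{D}}[\max\{\hat{\zeta}(c, d, \alpha, h_i, P), \theta\}] \le 2 \cdot \mathrm{E}_{i \sim \mathcal{D}}[\max\{\zeta(c, d, \alpha, h_i), \theta\}]
   \le 2 \cdot \left(\mathrm{E}_{i \sim \mathcal{D}}[\zeta(c, d, \alpha, h_i)] + \theta\right). \tag{5} \]

Similarly,

\[ p_2 \ge \mathrm{E}_{i \sim \mathcal{D}}[\hat{\eta}(d, \beta, h_i, P)] \ge \mathrm{E}_{i \sim \mathcal{D}}[\max\{\hat{\eta}(d, \beta, h_i, P), \theta\}] - \theta \ge \frac{1}{2} \cdot \mathrm{E}_{i \sim \mathcal{D}}[\max\{\eta(d, \beta, h_i), \theta\}] - \theta
   \ge \frac{1}{2} \cdot \mathrm{E}_{i \sim \mathcal{D}}[\eta(d, \beta, h_i)] - \theta. \tag{6} \]
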

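Averaging (4) and applying Jensen’s inequality, we have

\[ \mathrm{E}_{i \sim \mathcal{D}}[\zeta(c, d, \alpha, h_i)] \le \mathrm{E}_{i \sim \mathcal{D}}[\eta(d, \beta, h_i)]^{\frac{1}{2c-1} - o(1)} + 2^{-\gamma d}. \tag{7} \]

Thus, substituting (5) and (6) into (7), we obtain

\[ \frac{p_1}{2} - \theta \le \left(2 (p_2 + \theta)\right)^{\frac{1}{2c-1} - o(1)} + 2^{-\gamma d}, \]

which proves the theorem, since $\theta \ll p_1$, $2^{-\gamma d} \ll p_1^{\omega(1)}$, and $p_1 = o(1)$.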

Now let us deal with the case $p_1 = \Omega(1)$. This can be reduced to the case $p_1 = o(1)$ by
choosing a slowly-growing super-constant $k$ and replacing the set of $T$ functions with the set
of tuples of length $k$. This replaces $p_1$ and $p_2$ with $p_1^k$ and $p_2^k$, respectively. At the same
time, we choose $k$ so that $T' = T^k$ still satisfies the hypothesis of the theorem. Then, we just
apply the above proof.
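The bookkeeping of this reduction is easy to check numerically: raising both probabilities to the $k$-th power leaves the exponent unchanged while driving $p_1$ toward zero, and the description complexity $\log T$ grows only by the factor $k$. The sketch below mirrors the replacement stated above. The function names and the sample values are hypothetical.

```python
import math


def concatenate(p1, p2, T, k):
    """Parameters (p1^k, p2^k, T^k) of the family of k-tuples of hash functions."""
    return p1 ** k, p2 ** k, T ** k


def exponent(p1, p2):
    """rho = log(1/p1) / log(1/p2)."""
    return math.log(1 / p1) / math.log(1 / p2)
```

For example, with $p_1 = 0.9$, $p_2 = 0.5$, $T = 1000$ and $k = 8$, the new $p_1$ is about $0.43$, the exponent is the same as before, and $\log T' = 8 \log T$.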
A Upper Bounds Are in the Model


In this section, we show how the data-dependent hash family from [AR15] fits into the model
of the lower bound from Section 1.1.

Let us briefly recall the hash family construction from [AR15]. For simplicity, assume that
all points and queries lie on a sphere of radius $R \gg cr$. First, consider a data-independent
hash family from [AINR14, AR15], called Spherical LSH. It gives a good exponent
$\rho \le \frac{1}{2c^2-1} + o(1)$ for distance thresholds $r$ vs $\sqrt{2}R$ (the latter corresponds to a typical distance
between a pair of points from the sphere). The main challenge that arises is how to handle
distance thresholds $r$ vs $cr$, where the latter may be much smaller than $\sqrt{2}R$. Here comes
the main insight of [AR15].

We would like to process the dataset so that the distance between a typical pair of data
points is around $\sqrt{2}R$, so as to apply Spherical LSH. To accomplish this, we remove all the
clusters of radius $(\sqrt{2} - \varepsilon)R$ that contain lots of points (think of $\varepsilon > 0$ being barely
sub-constant). We will treat these clusters separately, and will focus on the remainder of the
pointset for now. So for the remainder of the pointset, we just apply the Spherical LSH to
it. This scheme turns out to satisfy the definition of the data-dependent hash family for
the remaining points and for distance thresholds $r$ vs. $cr$; in particular, the hash function
is sensitive for the remaining set only! The intuition is that, for any potential query point,
there is only a small number of data points within distance $(\sqrt{2} - \varepsilon)R$ (otherwise, they
would have formed yet another cluster, which we would have removed), and the larger
distances are handled well by the Spherical LSH. Thus, for a typical pair of data points,
Spherical LSH is sensitive for $P$ (see Definition 1.3).

How does [AR15] deal with the clusters? The main observation is that one can enclose
such a cluster in a ball of radius $(1 - \Omega(\varepsilon^2))R$, which intuitively makes our problem a little bit
simpler (after we reduce the radius enough times, the problem becomes trivial). To answer
a query, we query every cluster, as well as one part of the remainder (partitioned by the
Spherical LSH). This can be shown to work overall; in particular, we can control the overall
branching and depth of the recursion (see [AR15] for the details).

For both the clusters, as well as the parts obtained from the Spherical LSH, the algorithm
recurses on the obtained point subsets. The overall partitioning scheme from [AR15] can
be seen as a tree, where the root corresponds to the whole dataset, and every node either
corresponds to a cluster or to an application of the Spherical LSH. One nuance is that, in
different parts of the tree the Spherical LSH partitioning obtains different $p_1, p_2$ (depending
on $R$). Nonetheless, each time it holds that $p_1 \ge p_2^{\rho}$ for $\rho \le \frac{1}{2c^2-1} + o(1)$. Hence, a node
terminates as a leaf once its accumulated $p_2$ (product of $p_2$’s of “Spherical LSH” nodes along
the path from the root) drops below $1/n$.

We now want to argue that the above algorithm can be recast in the framework of
data-dependent hashing as per Definition 1.3. We consider a subtree of the overall tree that
contains the root and is defined as follows. Fix a parameter $l = n^{-o(1)}$ (it is essentially the
target $p_2$ of the partition). We perform DFS of the tree and cut the tree at any “Spherical
LSH” node where the cumulative $p_2$ drops below $l$. This subtree gives a partial partition
of the dataset $P$ as follows: for “Spherical LSH” nodes we just apply the corresponding
partition, and for cluster nodes we “carve” them in the random order. It turns out that if
we choose $l = n^{-o(1)}$ carefully, the partition will satisfy Definition 1.3 and the preconditions
of Theorem 1.4. In particular, the description complexity of the resulting hash function is
$n^{o(1)}$.

Let us emphasize that, while Definition 1.3 is certainly necessary for an ANN data structure
based on data-dependent hashing, it is not sufficient. In fact, [AR15] prove additional
properties of the above partitioning scheme, essentially because the “$p_2$ property” is an “on
average” one (thus, we end up having to understand how this partitioning scheme treats
triples of points).